Toward Scheduling I/O Request of Mapreduce Tasks Based on Markov Model
Abstract
In cloud storage built on multi-core CPUs, many MapReduce applications may run in parallel on each compute node, collocated with local disk storage. These disks are shared by multiple applications that together use the full CPU power of the node. Each application tends to issue contiguous I/O requests to the same disk. However, if a large number of MapReduce tasks enter the I/O phase at the same time, requests from one task may be interrupted by requests from other tasks. Under this I/O contention, the I/O nodes receive the requests in a non-contiguous order. This interleaved access pattern degrades the performance of MapReduce applications. The effect is most pronounced when multiple tasks write intermediate files in parallel to the shared disk storage. To overcome this problem, we propose an approach for optimizing write access for MapReduce applications. The contributions of this paper are: (1) an analysis of the open issues in scheduling access requests of MapReduce workloads; (2) a framework for scheduling and predicting the I/O requests of MapReduce applications; and (3) a description of the role of each component involved in scheduling these I/O requests at the block level of the storage server to provide contiguous access.
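The abstract does not spell out the model or the scheduler, so the following is only a minimal sketch under stated assumptions. It assumes a first-order Markov chain over the IDs of the tasks issuing consecutive requests. It also assumes a scheduler that buffers a window of requests and reorders them per task by offset before dispatch. The names `Request`, `MarkovPredictor`, `schedule_window`, `merge_extents` and `count_seeks` are hypothetical and are not the paper's components. The sketch illustrates two things: why interleaving costs non-contiguous jumps, and how reordering a window restores contiguity.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Request:
    """Hypothetical block-level write request issued by one MapReduce task."""
    task_id: str
    offset: int  # starting byte offset on the shared disk
    length: int  # number of bytes written


class MarkovPredictor:
    """First-order Markov chain over the task IDs of consecutive requests.

    Assumption (not from the paper): the next requesting task depends
    only on the task that issued the current request.
    """

    def __init__(self) -> None:
        self.transitions: Dict[str, Counter] = defaultdict(Counter)

    def observe(self, stream: List[Request]) -> None:
        # Count every (current task -> next task) transition in arrival order.
        for prev, nxt in zip(stream, stream[1:]):
            self.transitions[prev.task_id][nxt.task_id] += 1

    def probability(self, current: str, nxt: str) -> float:
        # Maximum-likelihood estimate of P(next task = nxt | current task).
        counts = self.transitions.get(current, Counter())
        total = sum(counts.values())
        return counts[nxt] / total if total else 0.0

    def predict(self, current: str) -> Optional[str]:
        # Most likely next task, or None if `current` was never observed.
        counts = self.transitions.get(current)
        return counts.most_common(1)[0][0] if counts else None


def count_seeks(stream: List[Request]) -> int:
    """Number of non-contiguous jumps between consecutive requests."""
    return sum(1 for prev, nxt in zip(stream, stream[1:])
               if nxt.offset != prev.offset + prev.length)


def schedule_window(window: List[Request]) -> List[Request]:
    """Reorder a buffered window so each task's writes are issued back to back."""
    return sorted(window, key=lambda r: (r.task_id, r.offset))


def merge_extents(ordered: List[Request]) -> List[Tuple[int, int]]:
    """Coalesce adjacent requests into (offset, length) extents."""
    extents: List[List[int]] = []
    for r in ordered:
        if extents and extents[-1][0] + extents[-1][1] == r.offset:
            extents[-1][1] += r.length  # extends the previous extent
        else:
            extents.append([r.offset, r.length])
    return [(off, length) for off, length in extents]


if __name__ == "__main__":
    BLOCK = 4096
    # Two map tasks spill intermediate data to disjoint regions of one disk,
    # but their requests arrive interleaved under I/O contention.
    a = [Request("map-A", i * BLOCK, BLOCK) for i in range(3)]
    b = [Request("map-B", 1_000_000 + i * BLOCK, BLOCK) for i in range(3)]
    arrival = [r for pair in zip(a, b) for r in pair]

    predictor = MarkovPredictor()
    predictor.observe(arrival)
    print("P(map-B after map-A):", predictor.probability("map-A", "map-B"))
    print("seeks in arrival order:", count_seeks(arrival))
    ordered = schedule_window(arrival)
    print("seeks after scheduling:", count_seeks(ordered))
    print("extents:", merge_extents(ordered))
```

In arrival order, the six interleaved 4 KiB writes cost five non-contiguous jumps. After reordering the window, they cost one and coalesce into two extents. The per-task transition probability is one possible signal a scheduler could use to decide whether to hold requests briefly for a task that is likely to continue. That use is an assumption for illustration, not a detail given in the abstract.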
Publication date: 2015